Design Instagram

A comprehensive system design for Instagram, covering architecture, scalability, and key features.

Problem Statement
Requirements
Capacity Estimation
System APIs
Database Design
High-Level Design
Component Design
Data Partitioning
Caching Strategy
Load Balancing
Security & Privacy
Monitoring & Analytics

Problem Statement

Design a photo and video sharing social networking service similar to Instagram where users can:

Upload photos and videos
Follow other users
View a personalized feed of photos/videos from followed users
Like and comment on posts
Search for users, photos, and hashtags
Receive notifications

Requirements

Functional Requirements

User Management
- User registration and authentication
- User profiles (bio, profile picture, follower/following counts)
- Follow/unfollow users
Content Management
- Upload photos and videos
- Apply filters to photos
- Add captions, hashtags, and location tags
- Delete posts
Feed Generation
- Personalized feed showing posts from followed users
- Chronological and algorithm-based feeds
- Infinite scroll
Social Features
- Like posts
- Comment on posts
- Share posts
- Save posts
Search
- Search users by username
- Search posts by hashtags
- Search posts by location
Notifications
- Real-time notifications for likes, comments, follows
- Push notifications

Non-Functional Requirements

Scalability: Support 1 billion users, 500 million daily active users
Availability: 99.9% uptime
Performance:
- Feed generation: < 200ms
- Image upload: < 3 seconds
- Image retrieval: < 100ms
Reliability: No data loss, handle failures gracefully
Consistency: Eventual consistency is acceptable for social features

Capacity Estimation

Traffic Estimates

Total users: 1 billion
Daily active users (DAU): 500 million
Posts per day: 100 million (20% of DAU post daily)
Average posts per posting user per day: 1 (100 million posts from 100 million posting users)
Read-to-write ratio: 100:1 (users view more than they post)
Feed views per day: 5 billion (10 feeds per DAU)
Likes per day: 5 billion (10 likes per DAU)
Comments per day: 500 million (1 comment per DAU)

Storage Estimates

Photo Storage:

Average photo size: 200 KB
Photos per day: 100 million
Daily storage: 100M × 200 KB = 20 TB/day
Yearly storage: 20 TB × 365 = 7.3 PB/year
With 3x replication: 21.9 PB/year
5 years retention: ~110 PB

Video Storage:

Average video size: 3 MB (15 seconds)
Videos per day: 20 million (20% of posts)
Daily storage: 20M × 3 MB = 60 TB/day
Yearly storage: 60 TB × 365 = 21.9 PB/year
With 3x replication: 65.7 PB/year
5 years retention: ~330 PB

Metadata Storage:

Post metadata: ~1 KB per post
100M posts/day × 1 KB = 100 GB/day
Yearly: 100 GB × 365 = 36.5 TB/year
With 3x replication: 110 TB/year
5 years: ~550 TB

Total Storage: ~440 PB (with replication and 5 years retention)
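
The storage figures above can be re-derived with a few lines of arithmetic. This is a minimal sketch that uses decimal units (1 PB = 10^15 bytes) and the same inputs as the estimates; the helper name `retained_bytes` is made up for illustration.

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15  # decimal units

REPLICATION = 3
RETENTION_YEARS = 5


def retained_bytes(items_per_day, bytes_per_item):
    """Bytes held after RETENTION_YEARS, including replication."""
    return items_per_day * bytes_per_item * 365 * REPLICATION * RETENTION_YEARS


photos = retained_bytes(100_000_000, 200 * KB)
videos = retained_bytes(20_000_000, 3 * MB)
metadata = retained_bytes(100_000_000, 1 * KB)
total = photos + videos + metadata

print(f"photos:   {photos / PB:.1f} PB")    # photos:   109.5 PB
print(f"videos:   {videos / PB:.1f} PB")    # videos:   328.5 PB
print(f"metadata: {metadata / TB:.1f} TB")  # metadata: 547.5 TB
print(f"total:    {total / PB:.1f} PB")     # total:    438.5 PB
```

The unrounded total is about 438.5 PB; the ~440 PB figure comes from rounding each component before adding.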

Bandwidth Estimates

Upload:

100M posts/day × 200 KB = 20 TB/day (photos only; the 60 TB/day of video uploads estimated above comes on top)
Average: 20 TB / 86,400s = 231 MB/s = 1.85 Gbps
Peak (5x average): ~9 Gbps

Download:

5B feed views/day × 200 KB = 1 PB/day
Average: 1 PB / 86,400s = 11.6 GB/s ≈ 92.6 Gbps
Peak (5x average): ~463 Gbps

CDN will handle most download traffic, reducing origin server load.
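
As a check on the units, the bandwidth averages can be computed directly. This sketch assumes decimal units and, like the estimate itself, one 200 KB photo per post and per feed view; `avg_gbps` is an illustrative helper name.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 5


def avg_gbps(bytes_per_day):
    """Average throughput in gigabits per second for a daily byte volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


upload = avg_gbps(100e6 * 200e3)   # 100M posts/day x 200 KB
download = avg_gbps(5e9 * 200e3)   # 5B feed views/day x 200 KB

print(f"upload:   {upload:.2f} Gbps average")    # upload:   1.85 Gbps average
print(f"download: {download:.2f} Gbps average")  # download: 92.59 Gbps average
print(f"download peak: {download * PEAK_FACTOR:.0f} Gbps")
```

1 PB spread over a day is roughly 11.6 GB/s, so the download average lands in the tens of Gbps rather than Tbps.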

System APIs

RESTful API Design

// User APIs
POST   /api/v1/users/register
POST   /api/v1/users/login
GET    /api/v1/users/{userId}
PUT    /api/v1/users/{userId}
DELETE /api/v1/users/{userId}

// Follow APIs
POST   /api/v1/users/{userId}/follow
DELETE /api/v1/users/{userId}/follow
GET    /api/v1/users/{userId}/followers
GET    /api/v1/users/{userId}/following

// Post APIs
POST   /api/v1/posts
GET    /api/v1/posts/{postId}
DELETE /api/v1/posts/{postId}
GET    /api/v1/users/{userId}/posts

// Feed APIs
GET    /api/v1/feed?page={page}&limit={limit}

// Like APIs
POST   /api/v1/posts/{postId}/like
DELETE /api/v1/posts/{postId}/like
GET    /api/v1/posts/{postId}/likes

// Comment APIs
POST   /api/v1/posts/{postId}/comments
GET    /api/v1/posts/{postId}/comments
DELETE /api/v1/comments/{commentId}

// Search APIs
GET    /api/v1/search/users?q={query}
GET    /api/v1/search/posts?q={query}&type={hashtag|location}

// Notification APIs
GET    /api/v1/notifications
PUT    /api/v1/notifications/{notificationId}/read

Example API Request/Response

Upload Post:

POST /api/v1/posts
Request:
{
  "userId": "123",
  "imageUrl": "https://cdn.instagram.com/images/abc123.jpg",
  "caption": "Beautiful sunset #sunset #nature",
  "location": {"lat": 40.7128, "lng": -74.0060},
  "filters": ["vintage"]
}

Response:
{
  "postId": "post_456",
  "status": "success",
  "timestamp": "2024-01-15T10:30:00Z"
}

Get Feed:

GET /api/v1/feed?page=1&limit=20

Response:
{
  "posts": [
    {
      "postId": "post_123",
      "userId": "user_456",
      "username": "johndoe",
      "imageUrl": "https://cdn.instagram.com/images/abc123.jpg",
      "caption": "Beautiful sunset",
      "likes": 1250,
      "comments": 45,
      "timestamp": "2024-01-15T09:00:00Z",
      "isLiked": false
    }
  ],
  "nextPage": 2,
  "hasMore": true
}

Database Design

Data Models

User Table

CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(255) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    full_name VARCHAR(255),
    bio TEXT,
    profile_picture_url VARCHAR(512),
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    post_count INT DEFAULT 0,
    is_verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    -- username and email are already indexed by their UNIQUE constraints
);

Post Table

CREATE TABLE posts (
    post_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    image_url VARCHAR(512) NOT NULL,
    video_url VARCHAR(512),
    caption TEXT,
    location_lat DECIMAL(10, 8),
    location_lng DECIMAL(11, 8),
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
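    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    -- Composite index serves "recent posts by this user" lookups
    INDEX idx_user_created (user_id, created_at),
    INDEX idx_created_at (created_at),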
    INDEX idx_location (location_lat, location_lng)
);

Follow Table

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id),
    FOREIGN KEY (follower_id) REFERENCES users(user_id),
    FOREIGN KEY (followee_id) REFERENCES users(user_id),
    -- follower_id lookups use the leftmost prefix of the primary key
    INDEX idx_followee (followee_id)
);

Like Table

CREATE TABLE likes (
    like_id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY unique_like (post_id, user_id),
    FOREIGN KEY (post_id) REFERENCES posts(post_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    -- post_id lookups use the leftmost prefix of unique_like
    INDEX idx_user_id (user_id)
);

Comment Table

CREATE TABLE comments (
    comment_id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    parent_comment_id BIGINT,
    content TEXT NOT NULL,
    like_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (post_id) REFERENCES posts(post_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    FOREIGN KEY (parent_comment_id) REFERENCES comments(comment_id),
    INDEX idx_post_id (post_id),
    INDEX idx_user_id (user_id)
);

Hashtag Table

CREATE TABLE hashtags (
    hashtag_id BIGINT PRIMARY KEY,
    tag VARCHAR(255) UNIQUE NOT NULL,
    post_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    -- tag is already indexed by its UNIQUE constraint
);

CREATE TABLE post_hashtags (
    post_id BIGINT NOT NULL,
    hashtag_id BIGINT NOT NULL,
    PRIMARY KEY (post_id, hashtag_id),
    FOREIGN KEY (post_id) REFERENCES posts(post_id),
    FOREIGN KEY (hashtag_id) REFERENCES hashtags(hashtag_id),
    INDEX idx_hashtag_id (hashtag_id)
);

Notification Table

CREATE TABLE notifications (
    notification_id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    type ENUM('like', 'comment', 'follow', 'mention') NOT NULL,
    actor_id BIGINT,
    post_id BIGINT,
    comment_id BIGINT,
    is_read BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(user_id),
    -- user_id lookups use the leftmost prefix of idx_user_unread
    INDEX idx_user_unread (user_id, is_read, created_at)
);

Database Sharding Strategy

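Two shard keys are candidates; they trade query locality against even distribution.

Shard by User ID:

Use consistent hashing to distribute users across shards
A user's profile, posts, and outgoing follow relationships live on the same shard, so profile and "posts by user" queries touch a single shard
Feed generation still needs cross-shard queries (handled by the feed service), and very active accounts can create hot shards

Shard by Post ID:

Posts are spread evenly across shards regardless of author
Enables parallel processing of popular posts
"Posts by user" becomes a scatter-gather query unless a secondary index by user_id is maintained

With either key, foreign keys that cross shard boundaries cannot be enforced by the database and must be checked in the application.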
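
Consistent hashing, mentioned above for distributing users across shards, can be sketched as a hash ring with virtual nodes. This is an illustrative toy, not a production router: the shard names, the choice of SHA-256, and the 100 virtual nodes per shard are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to shards; each shard owns many virtual nodes on the ring."""

    def __init__(self, shards, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        index = bisect.bisect_right(self._points, self._hash(str(key)))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for(123))  # the same user_id always maps to the same shard
```

The property that matters for resharding is that adding a shard only moves the keys that land on the new shard's virtual nodes; keys on the other shards stay where they are.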

High-Level Design

┌─────────────┐
│   Client    │
│  (Mobile/   │
│    Web)     │
└──────┬──────┘
       │
       │ HTTPS
       │
┌──────▼─────────────────────────────────────────────┐
│              Load Balancer                         │
│         (Application Load Balancer)               │
└──────┬─────────────────────────────────────────────┘
       │
       ├─────────────────┬─────────────────┐
       │                 │                 │
┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
│   API       │   │   API       │   │   API       │
│  Server 1   │   │  Server 2   │   │  Server N   │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       └─────────────────┴─────────────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
    ┌────▼────┐     ┌────▼────┐    ┌────▼────┐
    │  Cache  │     │  Cache  │    │  Cache  │
    │ (Redis) │     │ (Redis) │    │ (Redis) │
    └────┬────┘     └────┬────┘    └────┬────┘
         │               │               │
         └───────────────┼───────────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
    ┌────▼────┐     ┌────▼────┐    ┌────▼────┐
    │Database │     │Database │    │Database │
    │ Shard 1 │     │ Shard 2 │    │ Shard N │
    └─────────┘     └─────────┘    └─────────┘
         │               │               │
         └───────────────┼───────────────┘
                         │
              ┌──────────▼──────────┐
              │   Object Storage    │
              │   (S3/CDN)          │
              │   - Photos          │
              │   - Videos          │
              └─────────────────────┘

Core Services

User Service: Handles user registration, authentication, profiles
Post Service: Manages post creation, retrieval, deletion
Feed Service: Generates personalized feeds
Social Graph Service: Manages follow relationships
Like/Comment Service: Handles likes and comments
Search Service: Handles search functionality
Notification Service: Manages notifications
Media Service: Handles image/video upload and processing

Component Design

Feed Generation Service

Challenge: Generate personalized feed for 500M DAU efficiently.

Approach 1: Pre-computed Feed (Fan-out on Write)

When user posts:
1. Get all followers (fan-out)
2. Insert post into each follower's feed cache
3. Feed retrieval is O(1) - just read from cache

Pros:
- Fast feed retrieval
- Real-time updates

Cons:
- Slow for users with millions of followers
- High write amplification

Approach 2: On-demand Feed (Fan-out on Read)

When user requests feed:
1. Get list of followed users
2. Fetch recent posts from each
3. Merge and rank posts
4. Return top N posts

Pros:
- No write amplification
- Works for celebrity accounts

Cons:
- Slower feed generation
- Higher read load

Hybrid Approach (Recommended):

- Fan out on write for posts from authors with fewer than 1,000 followers
- Pull at read time for posts from authors with 1,000 or more followers, and merge them into the pre-computed feed
- Use background workers to run the fan-out asynchronously
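
The hybrid approach can be sketched with in-memory dictionaries standing in for the social graph, post store, and feed cache. All names here are hypothetical, and the fan-out runs inline rather than in a background worker to keep the example short.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 1000  # follower cutoff used in the hybrid approach

followers = defaultdict(set)         # author_id -> ids of users following them
following = defaultdict(set)         # user_id -> ids of authors they follow
posts_by_author = defaultdict(list)  # author_id -> [(timestamp, post_id)]
feed_cache = defaultdict(list)       # user_id -> [(timestamp, post_id)] pushed on write


def follow(user_id, author_id):
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def publish(author_id, post_id, timestamp):
    posts_by_author[author_id].append((timestamp, post_id))
    # Fan out on write only for authors below the threshold.
    if len(followers[author_id]) < FANOUT_THRESHOLD:
        for follower_id in followers[author_id]:
            feed_cache[follower_id].append((timestamp, post_id))


def read_feed(user_id, limit=20):
    # Start from the pushed entries, then pull posts from large accounts.
    entries = set(feed_cache[user_id])
    for author_id in following[user_id]:
        if len(followers[author_id]) >= FANOUT_THRESHOLD:
            entries.update(posts_by_author[author_id])
    newest_first = sorted(entries, reverse=True)
    return [post_id for _, post_id in newest_first[:limit]]
```

Using a set for the merged entries avoids duplicates when an author crosses the threshold after some of their posts were already pushed.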

Feed Generation Algorithm:

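def generate_feed(user_id, page=1, limit=20):
    """Fan-out-on-read feed: merge recent posts from followed users and rank them.

    get_followed_users, get_recent_posts, and rank_posts are assumed to be
    provided by the social graph, post, and ranking components.
    """
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")

    # Get followed users
    followed_users = get_followed_users(user_id)

    # Get recent posts from followed users (one lookup per followee;
    # batch these calls in practice to avoid N round trips)
    posts = []
    for followee_id in followed_users:
        posts.extend(get_recent_posts(followee_id, limit=100))

    # Rank posts by:
    # - Recency (70% weight)
    # - Engagement (likes + comments) (20% weight)
    # - User relationship strength (10% weight)
    ranked_posts = rank_posts(posts)

    # Paginate (page is 1-based)
    start = (page - 1) * limit
    return ranked_posts[start:start + limit]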
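
One possible shape for `rank_posts` under the 70/20/10 weighting is sketched below. The weights come from the comments in the algorithm; everything else is an assumption made for illustration: the `Post` fields, the 24-hour recency half-life, the log-scaled engagement, and an `affinity` score in the range 0 to 1.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    age_hours: float
    likes: int
    comments: int
    affinity: float  # assumed relationship strength between viewer and author, 0..1


def score(post, max_engagement):
    # Recency halves every 24 hours (assumed half-life).
    recency = 0.5 ** (post.age_hours / 24.0)
    # Log scaling keeps viral posts from dominating; normalized to 0..1.
    if max_engagement > 0:
        engagement = math.log1p(post.likes + post.comments) / math.log1p(max_engagement)
    else:
        engagement = 0.0
    return 0.7 * recency + 0.2 * engagement + 0.1 * post.affinity


def rank_posts(posts):
    max_engagement = max((p.likes + p.comments for p in posts), default=0)
    return sorted(posts, key=lambda p: score(p, max_engagement), reverse=True)
```

With these weights a one-hour-old post with a handful of likes still outranks a three-day-old viral post, which is the behavior a recency-dominated feed implies.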

Media Upload Service

Flow:

1. Client uploads image/video to API server
2. API server validates file (size, type)
3. API server uploads to object storage (S3)
4. Trigger image processing service:
   - Generate thumbnails (multiple sizes)
   - Apply filters if requested
   - Extract metadata
5. Store metadata in database
6. Return post ID to client

Image Processing Pipeline:

Original Image (2MB)
    ↓
[Resize] → Thumbnail (50KB)
[Resize] → Medium (200KB)
[Resize] → Large (500KB)
[Filter] → Apply user-selected filter
    ↓
Store in object storage, serve via CDN
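
The resize step has to pick output dimensions for each rendition before any pixels are touched. The sketch below computes them while preserving aspect ratio and never upscaling; the longest-edge limits (150, 640, 1080 px) are hypothetical values, since the pipeline above only specifies approximate file sizes.

```python
# Hypothetical longest-edge limits in pixels for each rendition.
RENDITIONS = {"thumbnail": 150, "medium": 640, "large": 1080}


def fit_within(width, height, max_edge):
    """Scale (width, height) so the longest edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # never upscale small originals
    # Integer arithmetic avoids floating-point rounding surprises.
    return (max(1, width * max_edge // longest),
            max(1, height * max_edge // longest))


def plan_renditions(width, height):
    return {name: fit_within(width, height, edge)
            for name, edge in RENDITIONS.items()}


print(plan_renditions(4000, 3000))
# {'thumbnail': (150, 112), 'medium': (640, 480), 'large': (1080, 810)}
```

The actual resampling and filter application would be handed to an imaging library; this only covers the sizing decision.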

Search Service

Requirements:

Search users by username
Search posts by hashtags
Search posts by location

Implementation:

Use Elasticsearch for full-text search
Index users, posts, hashtags
Support fuzzy matching for usernames
Geospatial search for location-based queries

Index Structure:

// User Index
{
  "user_id": "123",
  "username": "johndoe",
  "full_name": "John Doe",
  "bio": "Photographer"
}

// Post Index
{
  "post_id": "456",
  "caption": "Beautiful sunset #sunset",
  "hashtags": ["sunset", "nature"],
  "location": {"lat": 40.7128, "lng": -74.0060}
}
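
The three search requirements map onto Elasticsearch query bodies roughly as follows. This sketch only builds the JSON request bodies as Python dictionaries; it assumes `username` is a text or keyword field, `hashtags` is a keyword field, and `location` is mapped as `geo_point`. Note that the object form of a `geo_point` uses the keys `lat` and `lon`, so a `lng` key would need renaming at indexing time.

```python
import json


def username_query(text):
    # Fuzzy match tolerates small typos in the username.
    return {"query": {"fuzzy": {"username": {"value": text, "fuzziness": "AUTO"}}}}


def hashtag_query(tag):
    # Exact match on a normalized hashtag (no leading '#', lowercase).
    return {"query": {"term": {"hashtags": tag.lstrip("#").lower()}}}


def nearby_posts_query(lat, lon, radius_km):
    # Filter posts within radius_km of the given point.
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


print(json.dumps(nearby_posts_query(40.7128, -74.0060, 5), indent=2))
```

Each dictionary would be sent as the body of a search request against the corresponding index.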

Data Partitioning

Partitioning Strategy

1. User Data Sharding

Shard by user_id using consistent hashing
Each shard contains user profile and metadata
Enables efficient user lookups

2. Post Data Sharding

Shard by post_id using consistent hashing
Distributes load evenly
Enables parallel processing

3. Social Graph Sharding

Shard by follower_id
All accounts a user follows (the "following" list) are on the same shard
Listing a user's followers needs a scatter-gather query, or a second copy of the relationships sharded by followee_id

4. Feed Data Sharding

Shard by user_id (feed owner)
Pre-computed feeds stored per user
Fast feed retrieval

Handling Hot Partitions

Problem: Celebrity accounts with millions of followers create hot partitions.

Solutions:

Separate celebrity feeds: Use on-demand generation instead of pre-computation
Rate limiting: Limit writes to celebrity feed caches
Caching: Aggressively cache celebrity posts
Separate infrastructure: Dedicated servers for high-traffic accounts

Caching Strategy

Cache Layers

1. Application Cache (Redis)

User profiles: 24-hour TTL
Post metadata: 1-hour TTL
Feed data: 5-minute TTL
Like/comment counts: 15-minute TTL

2. CDN Cache

Images and videos: Long TTL (1 year)
Static assets: Long TTL
Reduces origin server load

3. Database Query Cache

Frequently accessed queries cached
Invalidate on updates

Cache Invalidation

# When post is updated
def update_post(post_id, data):
    # Look up the author first; the cache keys below depend on it.
    # db.get_post is assumed to exist alongside db.update_post.
    user_id = db.get_post(post_id).user_id

    # Update database
    db.update_post(post_id, data)

    # Invalidate caches
    cache.delete(f"post:{post_id}")
    cache.delete(f"user:{user_id}:posts")
    # Invalidate feed caches for all followers
    followers = get_followers(user_id)
    for follower_id in followers:
        cache.delete(f"feed:{follower_id}")

Cache Patterns

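Write-through: Write to cache and database synchronously in the same operation
Write-behind: Write to cache first, then write to the database asynchronously
Cache-aside: Application manages the cache: reads check the cache and fall back to the database on a miss, then populate the cache; writes go to the database and invalidate the cached entry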

For Instagram:

Read-heavy: Use cache-aside for reads
Write-heavy: Use write-behind for posts (async processing), with the pending writes held in a durable queue so the no-data-loss requirement still holds
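
A cache-aside read path can be sketched with a small in-memory TTL cache standing in for Redis. `TTLCache`, `get_post`, and `load_from_db` are illustrative names, and the default TTL of one hour mirrors the post metadata TTL listed under Cache Layers.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def get_post(post_id, cache, load_from_db, ttl=3600):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(post_id)
    if value is not None:
        cache.set(key, value, ttl)
    return value
```

On the write side, the application updates the database and then calls `cache.delete` for the affected keys, so the next read repopulates the entry.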

Load Balancing

Load Balancing Strategy

1. DNS Load Balancing

Distribute requests across multiple data centers
Geographic routing for lower latency

2. Application Load Balancer

Round-robin or least connections
Health checks for server instances
SSL termination

3. Database Load Balancing

Read replicas for read queries
Primary database for writes
Connection pooling

Load Balancer Algorithms

Round Robin: Distribute requests evenly
Least Connections: Route to server with fewest connections
IP Hash: Route based on client IP (session affinity)
Weighted: Assign weights based on server capacity

For Instagram:

Use least connections for API servers
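Use IP hash only where session affinity is needed (for example, long-lived connections), since the API servers are otherwise stateless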
Use weighted for heterogeneous servers
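
Least connections and weighting combine naturally: route each request to the server with the lowest ratio of active connections to capacity weight. The class below is a toy sketch with made-up server names; real load balancers also account for health checks and connection draining.

```python
class LeastConnectionsBalancer:
    """Picks the server with the fewest active connections per unit of weight."""

    def __init__(self, weights):
        self.weights = dict(weights)                 # server -> capacity weight
        self.active = {server: 0 for server in self.weights}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=lambda s: self.active[s] / self.weights[s])
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


balancer = LeastConnectionsBalancer({"big": 2, "small": 1})
print([balancer.acquire() for _ in range(4)])  # ['big', 'small', 'big', 'big']
```

With all weights equal this reduces to plain least connections.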

Security & Privacy

Authentication & Authorization

Authentication:

JWT tokens for stateless authentication
Token refresh mechanism
OAuth for third-party login

Authorization:

Role-based access control (RBAC)
Resource-level permissions
Rate limiting per user

Data Security

Encryption:

TLS/SSL for data in transit
Encryption at rest for sensitive data
Encrypted database connections

Privacy:

Private accounts (approve followers)
Block/unblock users
Report inappropriate content
GDPR compliance (data deletion)

Security Measures

Input Validation: Sanitize all user inputs
SQL Injection Prevention: Use parameterized queries
XSS Prevention: Encode output for the context in which it is rendered
CSRF Protection: Token-based validation
Rate Limiting: Prevent abuse
DDoS Protection: Cloud-based protection services
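
Per-user rate limiting is commonly implemented with a token bucket. The sketch below is one such implementation under assumed parameters; the class names and the injectable `clock` are illustrative, and a multi-server deployment would keep the bucket state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserRateLimiter:
    """Keeps one bucket per user id."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets = {}

    def allow(self, user_id):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(user_id)` and return HTTP 429 when it yields `False`.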

Monitoring & Analytics

Key Metrics

System Metrics:

Request rate (QPS)
Response time (p50, p95, p99)
Error rate
Server CPU, memory, disk usage
Database connection pool usage

Business Metrics:

Daily active users (DAU)
Monthly active users (MAU)
Posts per day
Engagement rate (likes, comments)
User retention

Logging

Structured Logging:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "service": "feed-service",
  "user_id": "123",
  "action": "feed_request",
  "duration_ms": 150,
  "post_count": 20
}
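
A log line in this shape can be produced with Python's standard `logging` module and a custom formatter. This is a minimal sketch: the `JsonFormatter` class and the convention of passing extra fields under a `fields` key are assumptions, not part of the standard library.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object per line."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            "action": record.getMessage(),
        }
        # Extra fields are passed via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter("feed-service"))
logger = logging.getLogger("feed-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("feed_request",
            extra={"fields": {"user_id": "123", "duration_ms": 150, "post_count": 20}})
```

Emitting one JSON object per line keeps the output easy for a log aggregator to parse.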

Log Aggregation:

Centralized logging (ELK stack, Splunk)
Real-time monitoring and alerting
Log retention policies

Alerting

Critical Alerts:

High error rate (> 1%)
High latency (p99 > 1s)
Database connection pool exhaustion
Disk space low (< 20% free)
Service downtime

Scalability Considerations

Horizontal Scaling

Stateless Services: All API servers are stateless, easy to scale
Database Sharding: Distribute data across multiple shards
Read Replicas: Scale reads independently
Caching: Reduce database load

Vertical Scaling

Database: Increase instance size for hot shards
Compute: More CPU/memory for processing services

Optimization Strategies

Database Indexing: Proper indexes on frequently queried columns
Query Optimization: Avoid N+1 queries; use joins within a shard and batched lookups across shards
Async Processing: Background jobs for non-critical tasks
CDN: Serve static content from edge locations
Compression: Compress API responses (gzip)
Pagination: Limit result sets

Failure Handling

Single Point of Failure

Eliminate SPOFs:

Multiple API server instances
Database replication (primary-replica)
Multiple cache instances
Redundant load balancers
Multiple data centers

Failure Scenarios

1. Database Failure

Automatic failover to replica
Data replication for recovery
Regular backups

2. Cache Failure

Graceful degradation (serve from database)
Multiple cache instances
Cache warming strategies

3. Service Failure

Health checks and auto-restart
Circuit breakers
Fallback mechanisms
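
A circuit breaker stops calling a failing dependency for a while so that callers fail fast and can fall back. The sketch below is a simplified single-threaded version; the thresholds, the class names, and the injectable `clock` are assumptions for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self._clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would catch `CircuitOpenError` and serve a fallback, such as a cached or slightly stale feed.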

4. Data Center Failure

Multi-region deployment
Automatic traffic routing
Data replication across regions

Advanced Features

Stories (24-hour content)

Separate storage for temporary content
Auto-deletion after 24 hours
Higher write rate, lower retention

Reels (Short videos)

Video processing pipeline
Video transcoding (multiple qualities)
Higher storage and bandwidth requirements

Live Streaming

Real-time video streaming
WebRTC or HLS streaming
Separate infrastructure for live content

Direct Messaging

Real-time messaging (WebSocket)
Message queue for delivery
End-to-end encryption option

Technology Stack

Backend

Languages: Python (Django/Flask), Java (Spring), Go
Databases: PostgreSQL, MySQL (sharded)
Cache: Redis, Memcached
Message Queue: Kafka, RabbitMQ
Search: Elasticsearch
Object Storage: AWS S3, Google Cloud Storage

Frontend

Mobile: React Native, Swift (iOS), Kotlin (Android)
Web: React, Next.js

Infrastructure

Cloud: AWS, GCP, Azure
CDN: CloudFront, Cloudflare
Load Balancer: AWS ALB, Nginx
Monitoring: Prometheus, Grafana, Datadog

Summary

Designing Instagram requires:

Scalable Architecture: Handle billions of users and posts
Efficient Feed Generation: Hybrid approach for optimal performance
Media Storage: CDN + object storage for images/videos
Caching Strategy: Multi-layer caching for performance
Database Sharding: Distribute data across shards
Real-time Features: WebSocket for notifications, live features
Search: Elasticsearch for fast search
Security: Authentication, authorization, encryption
Monitoring: Comprehensive logging and alerting
Failure Handling: Redundancy and graceful degradation

Key Trade-offs:

Consistency vs Availability: Eventual consistency for social features
Latency vs Freshness: Cached feeds vs real-time updates
Storage vs Cost: Compression, retention policies
Complexity vs Performance: Pre-computed vs on-demand feeds
